ABSTRACT

Heterogeneity in genetic networks across different 
signaling molecular contexts can suggest molecular 
regulatory mechanisms. Here we describe a comparative
chi-square analysis (CPχ²) method, considerably
more flexible and effective than other
alternatives, to screen large gene expression data 
sets for conserved and differential interactions. 
CPχ² decomposes interactions across conditions
to assess homogeneity and heterogeneity. 
Theoretically, we prove an asymptotic chi-square 
null distribution for the interaction heterogeneity 
statistic. Empirically, on synthetic yeast cell cycle 
data, CPχ² achieved much higher statistical power
in detecting differential networks than alternative 
approaches. We applied CPχ² to Drosophila
melanogaster wing gene expression arrays col- 
lected under normal conditions, and conditions 
with overexpressed E2F and Cabut, two transcrip- 
tion factor complexes that promote ectopic cell 
cycling. The resulting differential networks suggest 
a mechanism by which E2F and Cabut regulate 
distinct gene interactions, while still sharing a 
small core network. Thus, CPχ² is sensitive in detecting
network rewiring and useful in comparing
related biological systems. 

INTRODUCTION 

Numerous methods have been developed for biological 
network reconstruction, which remains challenging 
owing to data insufficiency (1). Rather than reconstructing
full networks, the field has shifted toward identifying differential interaction
patterns across noisy biological networks (2), as
they can be linked directly to differences in molecular 
mechanisms. For example, a co-signaling molecule in a 
T cell can interact with more than one ligand or
counter-receptor and consequently may either stimulate 
or inhibit immunological functions dependent on a 
specific molecular context (3). A majority of methods to 
detect such network rewiring are based on differential cor- 
relation — the difference between gene-gene correlation 
coefficients (4). Generalizing to difference between other 
statistics obtained separately for each condition, the difference
between S-scores, based on a modified t-statistic,
was used to identify differential interactions (5). Such a 
difference-between-statistics paradigm, comparing statis- 
tics of patterns but not directly the patterns themselves, 
is either insensitive or prone to noise. Correlation is a 
function of both noise and interaction parameters. 
Unequal noise across conditions can lead to zero differential
linear correlation despite distinct slopes (Figure 2).
This constitutes the insensitivity deficiency of difference- 
between-statistics. On the other extreme, reconstruct-then- 
compare (RTC) (6) — reconstructing interaction patterns 
first, and then comparing the patterns for difference — 
ignores uncertainty in the patterns, and false positives 
tend to arise due to noise. Ouyang et al. (7) overcame 
these problems by characterizing homogeneity and hetero- 
geneity of parametric interaction patterns while also con- 
sidering uncertainty for continuous data. 
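
The insensitivity of differential correlation can be illustrated with a small numerical sketch. It assumes a simple linear model, y = slope * x + noise, which is an illustrative assumption and not a model taken from this paper; the function name `linear_correlation` is likewise hypothetical. Under that model the population correlation depends on both the slope and the noise, so two conditions with distinct slopes can share the same correlation.

```python
import math

def linear_correlation(slope, sd_x, sd_noise):
    """Population Pearson correlation of x and y = slope * x + noise,
    with x and noise independent (assumed illustrative model)."""
    signal = slope * sd_x
    return signal / math.sqrt(signal ** 2 + sd_noise ** 2)

# Condition 1: slope 1 with noise s.d. 1.
# Condition 2: slope 2 with noise s.d. 2 (distinct slope, unequal noise).
r1 = linear_correlation(slope=1.0, sd_x=1.0, sd_noise=1.0)
r2 = linear_correlation(slope=2.0, sd_x=1.0, sd_noise=2.0)

# The difference between correlations is zero although the slopes differ.
differential_correlation = r1 - r2
```

Both conditions give a correlation of 1/sqrt(2), so a difference-between-correlations test sees no change even though the slope has doubled.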

To balance between sensitivity to interaction patterns 
and robustness to noise, we present a comparative chi-square
analysis (CPχ²) to hunt for homogeneous and heterogeneous
nonparametric interaction patterns from
discrete data. An interaction is an association from one 
or more parent variables (e.g. transcript quantities of 
several genes) to a child variable (e.g. another gene's transcript
quantity), represented by the generalized truth table
(gtt), a discrete nonparametric function mapping parent
variables to a child variable (8). Nonparametric representation
enables detection of complex nonlinear interactions and is
thus more flexible than parametric approaches, including
differential correlation (4). A pair of interactions is
conserved if both have an identical gtt involving the 
same parent and child variables; otherwise, it is defined 
as differential. By decomposing a pair of interactions to 
measure their homogeneity and heterogeneity, we deter- 
mine whether interactions are conserved or differential. 
We show the heterogeneity statistic to be asymptotically 
chi-square distributed. In a simulation study comparing 
two pairs of cell cycle models for the budding and 
fission yeasts, we demonstrate that CPχ² is statistically
more powerful than RTC. Broadly, CPχ² is applicable
to systems with qualitative states such as Boolean 
networks and discrete dynamic Bayesian networks for 
comparing interactions under uncertainty. 



MATERIALS AND METHODS 

Comparative chi-square analysis of interactions 

The CPχ² framework is illustrated in Figure 1. The input
to CPχ² is observations of nodes, e.g. gene expression, in
networks under two or more conditions (Figure 1a). We
assume that the networks, defined over the same set of nodes, may
differ in either wiring or strength of interactions. Let
$D_1, \ldots, D_K$ be data sets measuring values of nodes in $K$
networks. The output is differential or conserved interactions
for each node across the networks (Figure 1c).
We first create a contingency table $C_k$ from $D_k$. Each
row index in a contingency table is a specific combinatorial
realization of one or more parent variables. Each
column index is a specific value the child variable can
take. The observed pattern in a contingency table represents
how the parent variables interact with the child
variable. The chi-square of a contingency table is a discrepancy
measure between the observed counts in its cells and the counts
expected when parent and child variables are independent.
The individual interaction strength $\chi^2_k$,
computed from $C_k$, measures parent-child association
separately for condition $k$. Summing up $\chi^2_k$ over $k$, we
obtain the total strength $\chi^2_t$, and by further breaking
it into homogeneity $\chi^2_c$ and heterogeneity $\chi^2_d$, we establish
a decomposition rule central to our framework
(Figure 1b):

$$\chi^2_1 + \cdots + \chi^2_K = \chi^2_t = \chi^2_c + \chi^2_d \qquad (1)$$

Under the null hypothesis of noninteracting homogeneity
across conditions, $\chi^2_t$ is asymptotically chi-squared
because it is the sum of independent chi-squares in the
$K$ conditions (9). $\chi^2_c$ is asymptotically chi-squared, as it
is computed on a single pooled contingency table. We
prove that $\chi^2_d$ is also asymptotically chi-squared. The statistical
significance of these test statistics decides whether
interactions are differential or conserved.
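
As an illustration of how a contingency table can be assembled from discretized observations, here is a minimal sketch. The function name `contingency_table`, its arguments and the toy data are hypothetical and are not taken from the CPχ² software; rows enumerate every combination of parent levels and columns enumerate child levels, as described above.

```python
from itertools import product

def contingency_table(parents, child, parent_levels, child_levels):
    """Count child values for every combination of parent values.

    parents: list of samples, each a tuple of discrete parent values.
    child: list of discrete child values, aligned with `parents`.
    parent_levels: number of discrete levels of each parent variable.
    child_levels: number of discrete levels Q of the child.
    Returns an R x Q list of lists, where R is the product of parent_levels.
    """
    rows = list(product(*[range(m) for m in parent_levels]))
    row_index = {combo: i for i, combo in enumerate(rows)}
    table = [[0] * child_levels for _ in rows]
    for combo, value in zip(parents, child):
        table[row_index[tuple(combo)]][value] += 1
    return table

# Toy data: two binary parents (R = 4 rows) and a binary child (Q = 2 columns);
# the child is the logical AND of the two parents.
parents = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1), (0, 1)]
child = [0, 0, 0, 1, 1, 0]
table = contingency_table(parents, child, parent_levels=[2, 2], child_levels=2)
```

One such table would be built per condition, all sharing the same row and column layout so that they can later be pooled cell by cell.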



Interaction homogeneity and heterogeneity via 
decomposition 

By three chi-square tests, we assess total strength, strength
of homogeneity and strength of heterogeneity for interactions
across $K$ conditions. For a node $X$, or child, of
$Q$ discrete levels in the networks, we evaluate its hypothetical
parent sets $\Pi_1, \ldots, \Pi_K$ under $K$ different conditions
via chi-square statistics on contingency tables formed
between the parents and the child. We first identify the
smallest super parent set $\Pi = \Pi_1 \cup \ldots \cup \Pi_K$. Let $R$ be
the number of combinations of discrete levels in $\Pi$. Let
$n_{ij,k}$ be the number of observations in entry $(i,j)$ of the $R \times Q$
contingency table $C_k$ with sample size $n_k$ under condition
$k$. We compute $K$ chi-squares with degrees of freedom
(d.f.) $\nu_k = (R-1)(Q-1)$ to assess the strength of an
interaction under each condition by

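$$\chi^2_k = \sum_{i=1}^{R} \sum_{j=1}^{Q} \frac{(n_{ij,k} - \bar{n}_{ij,k})^2}{\bar{n}_{ij,k}}, \quad k = 1, \ldots, K \qquad (2)$$

where the expected count in entry $(i,j)$ of $C_k$ is

$$\bar{n}_{ij,k} = \frac{1}{n_k} \sum_{q=1}^{Q} n_{iq,k} \sum_{r=1}^{R} n_{rj,k} \qquad (3)$$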

under the null hypotheses that no interaction exists
between the given parents and child in each condition. If
both $n_{ij,k}$ and $\bar{n}_{ij,k}$ are zero for a cell, the cell contributes
zero to $\chi^2_k$. Summing up the $\chi^2_k$'s, we obtain the 'total strength'
of interaction

$$\chi^2_t = \chi^2_1 + \cdots + \chi^2_K \qquad (4)$$

as our first chi-square statistic, measuring evidence of
active interactions under 'some' of the $K$ conditions, regardless
of whether they are differential or conserved. The null hypothesis is
that no active interaction exists between any parent sets
and the child in 'any' condition. Under the null hypothesis,
$\chi^2_t$ asymptotically follows a chi-square distribution
with d.f. $\nu_t = \sum_{k=1}^{K} \nu_k$ and P-value $p_t$.
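
The P-values attached to these statistics are upper-tail probabilities of a chi-square distribution. The following is a minimal standard-library sketch of that computation through the regularized incomplete gamma function (a series for small arguments and a continued fraction otherwise). The helper names are hypothetical, and in practice a library routine such as `scipy.stats.chi2.sf` would normally be preferred.

```python
import math

def _gamma_p_series(a, x, max_iter=1000, eps=1e-15):
    """Regularized lower incomplete gamma P(a, x) by power series (x < a + 1)."""
    term = 1.0 / a
    total = term
    for n in range(1, max_iter):
        term *= x / (a + n)
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def _gamma_q_fraction(a, x, max_iter=1000, eps=1e-15):
    """Regularized upper incomplete gamma Q(a, x) by continued fraction
    (modified Lentz method), intended for x >= a + 1."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_sf(stat, df):
    """Upper-tail probability P(X >= stat) for X ~ chi-square with df d.f."""
    if df <= 0:
        raise ValueError("df must be positive")
    if stat <= 0.0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _gamma_p_series(a, x)
    return _gamma_q_fraction(a, x)

# Example: a total strength of 32 on 2 d.f. gives a very small P-value.
p_t = chi2_sf(32.0, 2)
```

For 2 d.f. the upper tail has the closed form exp(-stat / 2), which offers a quick sanity check of the sketch.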

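To measure the overall agreement of the interactions
among all $K$ conditions, we develop a homogeneity test.
We fill in an $R \times Q$ contingency table $C_{pool}$ using parent
superset $\Pi$ and child values from $D_1, \ldots, D_K$. Thus,
entry $(i,j)$ of $C_{pool}$ contains $n_{ij} = \sum_{k=1}^{K} n_{ij,k}$ observations.
We now compute our second $\chi^2$ statistic as the 'strength of
homogeneity':

$$\chi^2_c = \sum_{i=1}^{R} \sum_{j=1}^{Q} \frac{(n_{ij} - \bar{n}_{ij})^2}{\bar{n}_{ij}} \qquad (5)$$

where the expected count in entry $(i,j)$ of $C_{pool}$ is

$$\bar{n}_{ij} = \frac{1}{n} \sum_{q=1}^{Q} n_{iq} \sum_{r=1}^{R} n_{rj}, \quad n = \sum_{k=1}^{K} n_k \qquad (6)$$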

under the null hypothesis that there is no consistent
pattern among the interactions between all parent sets
and the child in all $K$ conditions. Under this null hypothesis,
$\chi^2_c$ asymptotically follows a chi-square distribution
with d.f. $\nu_c = (R-1)(Q-1)$ and P-value $p_c$.

Figure 1. Overview of CPχ². (a) Observations are collected for a network in two contexts. Observed trajectories (shown as tables under each
network) are input to the analysis. (b) By the decomposition rule, after adding individual interaction strengths, we obtain the total strength, $\chi^2_t$, of a
pair of interactions, and decompose it into homogeneity $\chi^2_c$ and heterogeneity $\chi^2_d$. The decomposition is applied on every potential pair of interactions.
A pair of interactions showing the best fit to each condition is chosen for each node based on the resulting chi-square statistics. (c) Interactions showing strong
heterogeneity are differential and those showing strong homogeneity but insignificant heterogeneity are conserved. These interactions constitute the output.

To measure the strength of deviation of each interaction
from the homogeneous component of all interactions, we
define the 'strength of heterogeneity' by

$$\chi^2_d = \chi^2_t - \chi^2_c \qquad (7)$$

as our third statistic, where $|\chi^2_d|$ is chi-square
distributed with d.f. $\nu_d = \nu_t - \nu_c$ and P-value $p_d$ under
the null hypothesis that there are no interactions in any
contingency table. $\chi^2_d$ measures differential interactions
not due to row or column marginal distributions, as explained
in Supplementary Methods S3.1. The asymptotic
chi-square distribution of $\chi^2_d$ is derived from the following
theorem:

Theorem 1. 

Under the null hypothesis of $K$ homogeneous noninteracting
$R \times Q$ contingency tables, the heterogeneity
statistic $\chi^2_d = \sum_{k=1}^{K} \chi^2_k - \chi^2_c$ is asymptotically chi-square
distributed with $(K-1)(R-1)(Q-1)$ degrees of
freedom.

Here is a sketch of the proof: (i) Normalize each contingency
table by subtracting cell means and dividing by the
standard deviation based on a multinomial distribution of
the cell counts. (ii) Transform each normalized contingency
table to a matrix of identically and independently
distributed (i.i.d.) standard normal variables by using row-
and column-Helmert matrices. (iii) Apply the above two
steps on the pooled contingency table and obtain a matrix
of i.i.d. standard normal variables. (iv) Show that in each
cell the sum of normal variables squared minus the square
of the pooled normal variable for the same cell is a quadratic
form in the normal variables. We prove this quadratic
form to be chi-square distributed. (v) The
heterogeneity chi-square can then be represented as the
sum of these independent chi-square variables in each
cell, and is thus also chi-square distributed. A complete
proof is given in Supplementary Methods S3.1.
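
A quick Monte Carlo check can make the theorem tangible. The sketch below is an illustration under stated assumptions and not part of the proof: tables are drawn with uniform cell probabilities (so rows and columns are independent and identical across conditions), and the settings K = 2, R = 2, Q = 3, n = 200 and 1000 replicates are arbitrary choices. If the theorem holds, the average heterogeneity statistic should be close to its degrees of freedom, (K-1)(R-1)(Q-1) = 2.

```python
import random

def pearson_chi2(table):
    """Pearson chi-square of an R x Q table; cells with zero expected count
    contribute zero."""
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def simulate_null_heterogeneity(K, R, Q, n, reps, seed=0):
    """Simulate the heterogeneity statistic under homogeneous, noninteracting
    tables with uniform cell probabilities (an assumed null model)."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(R) for j in range(Q)]
    stats = []
    for _ in range(reps):
        tables = []
        for _ in range(K):
            t = [[0] * Q for _ in range(R)]
            for _ in range(n):
                i, j = rng.choice(cells)
                t[i][j] += 1
            tables.append(t)
        pooled = [[sum(t[i][j] for t in tables) for j in range(Q)]
                  for i in range(R)]
        stats.append(sum(pearson_chi2(t) for t in tables) - pearson_chi2(pooled))
    return stats

stats = simulate_null_heterogeneity(K=2, R=2, Q=3, n=200, reps=1000)
mean_stat = sum(stats) / len(stats)
expected_df = (2 - 1) * (2 - 1) * (3 - 1)
```

The mean of a chi-square variable equals its d.f., so `mean_stat` landing near `expected_df` is consistent with, though of course not a proof of, the stated null distribution.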

Combining Equations (4) and (7), we obtain the 'statistical
decomposition rule for discrete interactions':

$$\chi^2_1 + \cdots + \chi^2_K = \chi^2_t = \chi^2_c + \chi^2_d \qquad (8)$$

with

$$\nu_1 + \cdots + \nu_K = \nu_t = \nu_c + \nu_d \qquad (9)$$

which states that the total strength of interactions, as the sum
of the strengths of each individual interaction, can be
decomposed into a strength of homogeneity and a
strength of heterogeneity. This rule provides the guiding
principle underpinning the CPχ² framework.
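
The decomposition rule can be sketched in a few lines of code. This is a minimal illustration under the definitions above, not the authors' software; the function names `pearson_chi2` and `decompose` and the dictionary keys are hypothetical. All K tables are assumed to share the same R x Q layout.

```python
def pearson_chi2(table):
    """Pearson chi-square of an R x Q table; cells with zero expected count
    contribute zero."""
    n = sum(map(sum, table))
    row = [sum(r) for r in table]
    col = [sum(c) for c in zip(*table)]
    stat = 0.0
    for i, r in enumerate(table):
        for j, observed in enumerate(r):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def decompose(tables):
    """Split total strength into homogeneity and heterogeneity.

    tables: list of K contingency tables with identical R x Q shape.
    """
    K = len(tables)
    R, Q = len(tables[0]), len(tables[0][0])
    pooled = [[sum(t[i][j] for t in tables) for j in range(Q)]
              for i in range(R)]
    chi2_t = sum(pearson_chi2(t) for t in tables)   # total strength
    chi2_c = pearson_chi2(pooled)                   # homogeneity
    chi2_d = chi2_t - chi2_c                        # heterogeneity
    df_k = (R - 1) * (Q - 1)
    return {"chi2_t": chi2_t, "df_t": K * df_k,
            "chi2_c": chi2_c, "df_c": df_k,
            "chi2_d": chi2_d, "df_d": (K - 1) * df_k}

# Two conditions with opposite parent-child patterns: fully heterogeneous.
opposite = decompose([[[8, 0], [0, 8]], [[0, 8], [8, 0]]])
# Two conditions with the same pattern: fully homogeneous.
identical = decompose([[[8, 0], [0, 8]], [[8, 0], [0, 8]]])
```

In the first example the pooled table is flat, so all of the total strength (32 on 2 d.f.) is heterogeneity (32 on 1 d.f.); in the second the pooled table carries all of it as homogeneity and the heterogeneity is zero.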

Parents in a gene interaction, assumed given so far, are 
often unknown. In our software, the network topology 
can be either externally provided through an open user 
interface or the program can internally learn the 
network topology using various criteria. We can learn 
network topologies by maximizing network conservation
or differentiation if such preference can be justified in
advance. Our experience indicates that for networks
without a prior tendency toward being conserved or differential,
a network topology maximizing the fit to the
data for each condition performed the best, as
demonstrated in our yeast cell cycle simulation study.
We also allow the network topologies to differ across con- 
ditions but such options are effective only when sufficient 
data are provided to support the increased complexity. 

CPχ² assumes an independent two- (or multiple-) sample
design, where samples are independent in each condition.
This is often satisfied when each biological individual is
used exactly once under only one treatment/condition.

Drosophila wing gene expression data and preprocessing 

Cell cycle exit occurs in the Drosophila wing at 24 h after 
puparium formation (h APF) under normal conditions. 
When E2F or Cabut (Cbt) are overexpressed, wing cells 
go through at least one extra cycle and instead exit the cell 
cycle at 36 h APF (10). We therefore used NimbleGen
Drosophila expression microarrays to study gene expression
in the fly wing in response to overexpression of Cbt or E2F
at both the normal exit time, 24 h APF, and the delayed
exit time, 36 h APF. RNA sample preparation and data
normalization are described in Supplementary Methods
S3.5.

To filter out transcripts that were not significantly dif- 
ferentially expressed in the experiments, we used two-way 
analysis of variance on time (24h/36h), condition (E2F+/ 
Cbt+/wild type) and their interaction. This resulted in 
6711 transcripts out of the total 15 473 retained for comparative
analysis. To align the analysis with other biological
evidence, we compiled a priority list of 4653
transcripts, from the total 15 473, selected for gene
ontology terms suggesting roles in controlling gene expression,
developmentally important signaling pathways or
functions in cell cycle control. A total of 3768 priority
transcripts are statistically significantly differentially ex- 
pressed and thus included in the 6711 set. 

Observations of many transcripts are apparently
linearly correlated, likely owing to either the small
sample size (24) for a large number of priority transcripts 
(3768) or truly linearly correlated biological function. To 
avoid favoring by chance any one of them as a parent to a
child, we group them into linearly correlated clusters to
serve as parents. When an interaction from a parent 
cluster to a child gene is identified, all members in the 
parent cluster are considered candidates to a potential 
biological interaction. By hierarchical variable clustering, 
the 3768 priority transcripts formed 491 groups of linearly 
correlated genes and 34 groups of a single transcript, 
based on 24 observations at time points 24 h APF and 
36 h APF, with four replicates under three conditions. 
As transcripts in the same cluster are either positively or
negatively linearly correlated, in the quantization to be done
next, each transcript in the same cluster as parents 
(including those negatively correlated) would lead to 
similar chi-square values for a given child. Thus we 
consider them mathematically equivalent in the context 
of CPχ² and only choose a cluster representative for
further analysis. The cluster representative is the transcript
with the largest median correlation coefficient with all other
transcripts in the same cluster.
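
Choosing a cluster representative can be sketched as follows. The function names and the toy expression profiles are hypothetical. Because clusters may contain negatively correlated transcripts, the sketch ranks by the median absolute correlation by default; that use of the absolute value is an assumption here, since the text states only "largest median correlation coefficient", and it can be switched off with `use_absolute=False`.

```python
from statistics import median

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def cluster_representative(profiles, use_absolute=True):
    """Return the transcript whose median correlation with all other
    members of the cluster is largest.

    profiles: dict mapping transcript name -> list of expression values.
    """
    best_name, best_score = None, float("-inf")
    for name, values in profiles.items():
        others = [pearson(values, v)
                  for other, v in profiles.items() if other != name]
        if use_absolute:
            others = [abs(r) for r in others]
        score = median(others) if others else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name

# Toy cluster of three transcripts measured in four samples.
profiles = {
    "tx_a": [0, 1, 2, 3],
    "tx_b": [0, 1, 2, 4],
    "tx_c": [1, 1, 2, 3],
}
representative = cluster_representative(profiles)
```

In this toy cluster `tx_b` is the most central profile: its correlations with the other two members are both higher than the correlation between `tx_a` and `tx_c`.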

Next, we discretized continuous gene expression data to 
three discrete levels of low, intermediate and high. 
Discretization is achieved by a joint-likelihood quantization
using sequential dynamic programming (11). The
average estimated noise level is 0.22 over all quantized
transcripts (Supplementary Figure S3). The maximum
likelihood estimation of noise level is described in
Supplementary Methods S3.3.

The above preprocessing generates the input to CPχ²
analysis, including three files of gene expression levels 
under the conditions of E2F+, Cbt+ and the wild type 
control, respectively. Each file contains eight discrete 
samples with value 0, 1 or 2 for each of the 7202 
( = 491+6711) transcripts. Each file also specifies that 
only representatives of the 525 clusters of priority tran- 
scripts can be used as a parent (potential regulator) for a 
child transcript (any of the 7202). 

Highlighting differential gene interaction networks 
in fruit fly wing development 

We performed CPχ² analysis across the three experimental
conditions E2F+, Cbt+ and the normal wild type. Cbt and 
E2F delay cell cycle exit and cause ectopic cell cycles by 
regulating distinct but largely overlapping sets of genes 
(Supplementary Figure S1). Thus, we hypothesized that
overexpression of E2F or Cbt gives rise to differential 
gene interactions in reference to the wild type unperturbed 
state. 

In evaluating each potential parent-child relationship, 
the parent candidates were chosen from the priority gene 
clusters, and the potential children include every transcript 
and priority gene cluster. We inspected the parent-child 
relationships at the same time point, at a zero Markovian 
order. The maximum number of parents per child was set
to 1 as the sample size does not provide sufficient statistical
power to detect interactions with more parents. We
did not allow the parent identity to change for the same child
across conditions, because we anticipated changes in the strength
of gene interactions. All differential interaction P-values were adjusted
by the Benjamini-Hochberg method (12) to account for
the multiple testing effect by controlling the false discovery
rate.
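
For reference, the Benjamini-Hochberg adjustment can be sketched as below. This is the standard step-up procedure written from its textbook definition, with a hypothetical function name; it is not code from the CPχ² software.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Each raw P-value is scaled by m / rank, and the running minimum from the largest rank downward keeps the adjusted values monotone in the raw ones.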

We obtained a network topology that maximized the fit
to both E2F+ and Cbt+ data sets, capturing active interactions
in both data sets regardless of whether they are conserved or differential.
Then we classified each interaction in this active network
into one of three groups: (i) Conserved
between E2F+ and Cbt+ but differential from control, if
and only if $p_d$(E2F+ and Cbt+ versus control) < α,
$p_d$(E2F+ versus Cbt+) > α and $p_c$(E2F+ versus Cbt+)
< α; (ii) Differential between E2F+ and control and differential
between E2F+ and Cbt+, if $p_d$(E2F+, control)
< α and $p_d$(E2F+, Cbt+) < α; and (iii) Differential
between Cbt+ and control and differential between
E2F+ and Cbt+, if $p_d$(Cbt+, control) < α and $p_d$(E2F+,
Cbt+) < α. All these differential interactions require a statistically
significant change in the distribution of each
involved gene, which we call working zone change, as
detailed in Supplementary Methods S3.2.
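
The three-group rule can be written down as a short sketch. It assumes that the criteria refer to heterogeneity P-values (for differential interactions) and a homogeneity P-value (for conservation between E2F+ and Cbt+), that the threshold α is 0.05 (an assumed value), and it omits the working zone change requirement. The function name and dictionary keys are hypothetical. Groups (ii) and (iii) as written are not mutually exclusive, so the sketch returns every group whose criteria are met.

```python
def classify_interaction(p, alpha=0.05):
    """Return the groups (i)-(iii) whose criteria are met.

    p maps hypothetical keys to adjusted P-values:
      "d_both_vs_control": heterogeneity, E2F+ and Cbt+ versus control
      "d_e2f_vs_cbt":      heterogeneity, E2F+ versus Cbt+
      "c_e2f_vs_cbt":      homogeneity,   E2F+ versus Cbt+
      "d_e2f_vs_control":  heterogeneity, E2F+ versus control
      "d_cbt_vs_control":  heterogeneity, Cbt+ versus control
    """
    groups = []
    if (p["d_both_vs_control"] < alpha
            and p["d_e2f_vs_cbt"] > alpha
            and p["c_e2f_vs_cbt"] < alpha):
        groups.append("i")    # conserved E2F+/Cbt+, differential from control
    if p["d_e2f_vs_control"] < alpha and p["d_e2f_vs_cbt"] < alpha:
        groups.append("ii")   # unique to E2F+
    if p["d_cbt_vs_control"] < alpha and p["d_e2f_vs_cbt"] < alpha:
        groups.append("iii")  # unique to Cbt+
    return groups

# An interaction shared by E2F+ and Cbt+ but different from the control.
shared = classify_interaction({
    "d_both_vs_control": 0.001, "d_e2f_vs_cbt": 0.60, "c_e2f_vs_cbt": 0.002,
    "d_e2f_vs_control": 0.004, "d_cbt_vs_control": 0.003,
})
# An interaction that changes only under E2F+ overexpression.
e2f_only = classify_interaction({
    "d_both_vs_control": 0.20, "d_e2f_vs_cbt": 0.01, "c_e2f_vs_cbt": 0.30,
    "d_e2f_vs_control": 0.002, "d_cbt_vs_control": 0.70,
})
```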

Motif finding in Drosophila differential gene networks 

For the chosen genes that are differential between E2F+
or Cbt+ and the control, sequences upstream of the transcriptional
start site were obtained using the UCSC
Drosophila Genome Browser (13) or Regulatory 
Sequence Analysis Tools (14). Sequences were entered 
into Multiple EM for Motif Elicitation (MEME) (15) 
and the top five scoring motifs (of widths 6-12 bases) 
were obtained. Using MEME we looked for motifs 
enriched in gene clusters displaying differential inter- 
actions with working zone changes as well as the top 
200 most strongly E2F1 and Cbt co-upregulated genes. 
The rationale was that we could identify motifs specific 
to E2F and Cbt target gene sets that overlap in the co- 
regulated target gene clusters. TOMTOM (16) was used to 
compare the MEME identified motifs to known 
Drosophila motifs. As proof of principle, we were able to 
readily identify two distinct E2F binding sites. On exam- 
ination of Cbt regulated genes, we identified a novel 
Drosophila Mad-like motif (Supplementary Figure S2). 

RESULTS 

Sensitivity of CPχ² to interaction heterogeneity over
alternative approaches 

We first evaluated the sensitivity of CPχ² to interaction
heterogeneity over differential correlation and RTC. 

In several conceptual examples shown in Figure 2, the 
differential correlation method can be completely insensi- 
tive to some truly heterogeneous interaction patterns 
because each pair of patterns has identical correlation 
coefficients. 

RTC is an intuitive alternative for comparing inter- 
actions. We illustrate it with the generalized logical 
network reconstruction algorithm we developed previ- 
ously based on chi-square testing (8). Using the same 
basic chi-square statistic enables a fair experiment to 
study interaction comparison strategies. RTC first reconstructs
a gtt for each node using parents with the smallest
P-value of $\chi^2_1$ for the first network, and generates in isolation
another gtt based on $\chi^2_2$ for the second network.
Then it compares the difference between each pair of reconstructed
gtts to declare a conserved or differential
interaction. An interaction is conserved if its gtts are the
same across two conditions and at least one gtt is significant
(P-value < α, a false-positive threshold). An interaction
is differential if the two gtts are different and at
least one is significant. If both are insignificant, the interaction
is inactive or null. Such direct gtt comparison
ignores data uncertainty.
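
The RTC decision rule described above amounts to a few comparisons, sketched here. The function name, the representation of a gtt as a dictionary from parent values to a child value, and the default threshold of 0.05 are all assumptions made for illustration.

```python
def rtc_compare(gtt_1, p_1, gtt_2, p_2, alpha=0.05):
    """Reconstruct-then-compare call for one node across two conditions.

    gtt_1, gtt_2: reconstructed generalized truth tables, represented here
        as dicts mapping a tuple of parent values to a child value.
    p_1, p_2: P-values of the two reconstructed interactions.
    """
    if p_1 >= alpha and p_2 >= alpha:
        return "null"              # neither interaction is significant
    if gtt_1 == gtt_2:
        return "conserved"         # same table, at least one significant
    return "differential"          # tables differ, at least one significant

and_gate = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
or_gate = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}

call_same = rtc_compare(and_gate, 0.01, dict(and_gate), 0.20)
call_diff = rtc_compare(and_gate, 0.01, or_gate, 0.03)
call_null = rtc_compare(and_gate, 0.40, or_gate, 0.30)
```

The comparison `gtt_1 == gtt_2` is all-or-nothing: a single noisy entry in a reconstructed table flips the call from conserved to differential, which is the sense in which direct gtt comparison ignores data uncertainty.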

A second set of examples in Figure 3 illustrates a
decisive advantage of CPχ² in sensitivity to interaction
heterogeneity over differential correlation and RTC at a
small sample size. We created a pair of conserved and four
pairs of differential Boolean interactions. Each interaction,
with two parents and one child, forms a 4-bit
truth table. The four pairs of differential interactions
have increasing heterogeneity from 1 to 4 bits in their truth
tables. With these 10 truth tables, we simulated data sets of
a small sample size of 8 at the noise level of 0.2 using a noise
model defined in Supplementary Methods S3.3. Both the
sample size and the noise level of 0.2 are consistent with the
Drosophila gene expression data set (Supplementary Figure
S3). Then, we applied the three methods to the simulated
data sets. The receiver operating characteristic (ROC)
curves and areas under the ROC curves (AUCs) are qualitative
and quantitative indicators of the performance. Figure 3
shows that the sensitivity of CPχ² becomes progressively more
pronounced as interaction heterogeneity increases and is
maximized when the truth tables differ the most at 4 bits:
the gain of CPχ² in AUC is 31% over differential
correlation and 55% over RTC.

Figure 2. Conceptual limitations of differential correlation: (a) anti-correlation,
(b) shift, (c) reflection and (d) nonlinear interaction
patterns. Only anti-correlation in (a) can be detected by differential
correlation, while CPχ² detected all four differential interactions. (a)
Anti-correlation. Detectable by differential correlation: 1.0 − (−1.0) =
2 ≠ 0, and by CPχ²: $p_d$ = 4.9e-6. (b) Shift. Undetectable by differential
correlation: −0.87 − (−0.87) = 0. Detectable by CPχ²: $p_d$ = 0.0050. (c)
Reflection. Undetectable by differential correlation:
−0.80 − (−0.80) = 0. Detectable by CPχ²: $p_d$ = 0.0060. (d) Nonlinear.
Undetectable by differential correlation: 0 − 0 = 0. Detectable by CPχ²:
$p_d$ = 5.0e-5.

Benchmarking robustness to noise on yeast cell cycle 
networks 

We benchmarked the performance of CPχ² on comparing
two pairs of gene networks in budding and fission yeast,
respectively, against RTC and differential correlation,
using ROC curves at four noise levels (Figure 4). The
two pairs of cell cycle gene networks are plotted in
Supplementary Figures S6 and S8 and the corresponding
generalized logic rules are described in Supplementary
Figures S7, SB, S10 and S11. The first pair of budding
yeast models (17,18) is similar in network topology; the
second pair of fission yeast models (18,19) differs considerably
in both network topology and logic. Altogether
there are 13 differential and 7 conserved interactions in 
the two pairs. From each model, we simulated a number 
of trajectories, each lasting 2-13 time points, to cover all 
states of the networks. Then we added various levels of 
independent random noise to each gene in every state of 
each trajectory using the noise model defined in 
Supplementary Equation (S28). The noise does not 
modify the length of the trajectory. The trajectory pairs 
are input to CPχ² to obtain differential and conserved
interactions. 

In Figure 4, we define a true positive as a pair of true 
differential interactions declared as such involving no false 
parents. A false positive is a pair of true nondifferential 
interactions declared as differential. A true negative is a 
pair of true nondifferential interactions declared as such. 
A false negative is a pair of true differential interactions 
declared either with incorrect parents or as nondifferen- 
tial. Here, nondifferential refers to either conserved or null 
interactions. At each noise level, we collected accumulated
results against the ground truth. Then we plotted ROC
curves for detecting differential interactions. The
increase in AUC from RTC or differential correlation to
CPχ² is evident at the noise levels of 0.2 and 0.25, consistent
with what we encountered in biological data.
Specifically, CPχ² improved the AUC by ~5.5% from
differential correlation and by ~13-25% from RTC.
Therefore, CPχ² is more robust to noise in detecting differential
interactions than its alternatives. Full detail of
the yeast cell cycle simulation study is provided in
Supplementary Methods S3.4.
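
The outcome definitions used for the ROC curves can be restated as a small sketch. The function names and the toy list of calls are hypothetical; the point is the no-false-parent requirement, under which a truly differential pair declared differential with incorrect parents counts as a false negative.

```python
from collections import Counter

def outcome(truly_differential, declared_differential, parents_correct):
    """Label one declared interaction pair as TP, FP, TN or FN."""
    if truly_differential:
        if declared_differential and parents_correct:
            return "TP"
        return "FN"   # missed, or declared with incorrect parents
    return "FP" if declared_differential else "TN"

def roc_point(calls):
    """True and false positive rates for one threshold.

    calls: list of (truly_differential, declared_differential,
    parents_correct) tuples; assumes at least one truly differential and
    one truly nondifferential pair.
    """
    counts = Counter(outcome(*c) for c in calls)
    tpr = counts["TP"] / (counts["TP"] + counts["FN"])
    fpr = counts["FP"] / (counts["FP"] + counts["TN"])
    return tpr, fpr

calls = [
    (True, True, True),     # TP
    (True, True, False),    # FN: declared differential, wrong parents
    (True, False, True),    # FN: declared nondifferential
    (False, True, True),    # FP
    (False, False, True),   # TN
    (False, False, True),   # TN
]
tpr, fpr = roc_point(calls)
```

Because a differential call with wrong parents can never become a true positive by loosening the threshold, the true positive rate under this definition may plateau below 1.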

Cbt regulates distinct and overlapping gene interactions 
with E2F in cell cycle 

We then extended CPχ² to examine in vivo genetic interactions
in response to the ectopic expression of two
transcription factors that promote cell proliferation in
the wings of Drosophila melanogaster. The Drosophila
wing is used to study cell cycle control because it is
highly homogeneous and normally undergoes a well-characterized,
naturally synchronous cell cycle exit to
become permanently postmitotic during metamorphosis
(10,20,21). Consistent with its role in promoting the cell
cycle, the E2F complex is a well-established target for
negative regulation by tumor suppressor proteins such as
Retinoblastoma (22) and is positively regulated by oncogenes
such as SV40 Large T and Adenovirus E1A (23). We
have found the E2F complex to regulate the expression of
a number of cell cycle regulators, chromatin modifiers and
other factors comprising the 'E2F transcriptional
program' in the fly wing (24). Activation of the E2F
complex can delay the process of cell cycle exit and
cause ectopic cycling in the wing by promoting the expression
of hundreds of cell cycle regulators, chromatin modifiers
and other factors (24). Surprisingly, we have recently
found that overexpression of another, unrelated zinc
finger transcription factor Cbt (25-27), not previously
known to play a role in cell cycle regulation, can also
delay cell cycle exit and cause ectopic cycling. We have
thus applied CPχ² to detect differential genetic interactions
that might mediate the overlapping, yet distinct
transcriptional outputs of these two transcription factors.

Figure 3. Sensitivity to interaction heterogeneity. CPχ² shows a decisive advantage in sensitivity to heterogeneity on data sets with sample size (8) and
noise level (0.2) consistent with the Drosophila gene expression data set. The true positives are based on four pairs of differential Boolean interactions
with two parents, and the false positives are based on one pair of conserved interactions. As interaction heterogeneity increases from 1 (top left), 2
(top right), 3 (lower left) to 4 (lower right) bits, the improved performance of CPχ² in AUC contrasts sharply with the either stagnant or
deteriorating performance of differential correlation or RTC.

In addition to their many shared transcriptional targets,
Cbt and E2F also regulate a distinct nonoverlapping
group of transcripts (Supplementary Figure S1) and
have differing effects on the level of cell proliferation,
tissue patterning and apoptosis in the wing. Thus
comparing responses to their overexpression provides an
ideal opportunity to examine both conserved and differential
interactions in vivo. We applied CPχ² to the corresponding
expression array data collected with
overexpression of E2F (E2F+), Cbt (Cbt+) and the
normal wild type (control). We found that E2F+ and
Cbt+ are associated with different sets of differential
gene interactions from the control, albeit sharing a small
portion involved in promoting proliferation. Specifically,
we identified 111 unique differential interactions in E2F+
versus the control (Figure 5a), 14 differential interactions
from the control but conserved between the E2F+ and
Cbt+ conditions (Figure 5b), and 4 unique differential
interactions in Cbt+ versus control (Figure 5c).

BioGRID (28) searches confirmed five predicted interactions
(CG3008 → Ebi, CG8247 → Dah, Ntf-2
→ CG6084, CG9938 → tos and sub → ncd) and eight
genes (DREF, CycA, brm, dap, Ebi, CG13900, Rbf2
and CG13806) known to interact with E2F. These 13
interactions, marked with dashed lines in Figure 5, are
discussed for their biological function in Supplementary
Table S1. An evaluation of the evidence suggests that they
underpin a network of genes for proliferation by acting
cooperatively to promote S-phase and mitosis in response
to ectopic E2F or Cbt activity. Figure 5 also predicted
parent-child interactions for genes that do not have any
known interactions within BioGRID. Importantly, the 14
gene interactions shared by E2F and Cbt (green nodes in
Figure 5b) were conspicuous within this group, suggesting
a potential coherent core network modulated to promote
proliferation (Supplementary Results). Interestingly, our
analysis revealed novel interactions that suggest a role
for RIO kinases in modulating the function of a transcriptional
repressor, Ebi, on cell cycle genes (29). We also uncovered
several negative cell cycle regulatory loops
predicted to limit proliferation that are uniquely engaged
when E2F is activated, but not when Cbt is activated. This
is consistent with our previous research demonstrating
that E2F, when aberrantly active, also induces robust
cell cycle negative-feedback mechanisms to limit
abnormal proliferation (24).

Figure 4. Robustness to noise in comparative analysis of two pairs of yeast cell cycle models. Data were simulated from the four yeast cell cycle
models at increasing noise levels (0: no noise, 0.5: maximum possible noise). CPχ² again performs better in AUC than differential correlation or RTC
at the intermediate noise levels of 0.2 and 0.25, most consistent with what was observed in Drosophila gene expression data. When noise is at 0.35,
their distinction nearly vanishes. Here, ROC curves become flat and cannot reach a true positive rate of 1 owing to a no-false-parent requirement.

To seek further support that the regulatory role of Cbt
is distinct from E2F, we identified a novel Mad-like motif
(Supplementary Figure S2) in Cbt-regulated and Cbt/E2F
co-regulated genes, but not enriched in E2F-only
regulated genes. It is striking that this novel motif has
such a strong similarity to the Mad binding motif
(E-value < 3.4 x 10~^), as Cbt and its closest mammalian
homolog, KLF10 or TIEG1, are known to impinge on the
transforming growth factor β (TGF-β) signaling pathway
that converges on the Mad transcription factor (25,27,30).
One possibility is that Cbt may bind the identified Mad-like
site directly to regulate gene transcription, or it may
interact with a DNA binding partner, such as Mad, to
regulate target gene expression.

DISCUSSION 

Figure 5. Differential gene networks detected when proliferation is promoted in Drosophila wings by two perturbed transcription factors E2F and 
Cbt. Adjusted p-values for each detected differential interaction are marked on corresponding edges. All gene nodes are differentially expressed. Blue 
nodes are not children in any significant differential interactions detected, but are parents in other significant differential interactions. Dashed lines 
are known gene interactions obtained from BioGRID. (a) Unique significant differential interactions (dark tan) due to overexpression of E2F. (b) 
Consistent significant differential interactions (green) due to overexpression of E2F or Cbt. (c) Unique significant differential interactions (red) due to 
overexpression of Cbt. 

E2F and Cbt regulate a largely overlapping, yet distinct, 
set of cell cycle genes (Supplementary Figure S1). The 
newly discovered function of Cbt as a cell cycle regulator 
potentially provides cells with a mechanism for E2F- 
independent control of cell cycle genes. Cbt is a member 
of the highly conserved specificity protein/Krüppel-like 
factor (SP/KLF) family of transcription factors 
(25,26,31). The ability of Cbt to induce ectopic cell prolif- 
eration suggests that it could have oncogenic function. 
However, the most immediate mammalian homologs of 
Cbt, KLF10 and KLF11 (members of the TIEG family) 
are known primarily as cell cycle repressors (32). 
In mammals, KLF10 and KLF11 are expressed rapidly 
following induction of TGF-β signaling and function as 
effectors of TGF-β signaling (30,33-39) with overexpres- 
sion recapitulating TGF-β-induced cell cycle exit 
(30,36,39,40). In contrast, in Drosophila the TGF-β 
family member Dpp plays a well-known role in promoting 
proliferation and growth in the developing Drosophila 
wing (41) and Cbt has been shown to act positively 
on Dpp signaling in this context (27). In addition, 
ectopic activity of other members of the SP/KLF fam- 
ily has been linked to a variety of cancerous phenotypes 
(42-45). 

The Cbt-associated motif we identified (Supplementary 
Figure S2) is present in the promoters of many Cbt and 
E2F co-regulated genes, as well as in Cbt-only regulated 
genes. The sequence of the putative Cbt motif is consistent 
with known DNA-binding data for Drosophila Cbt as well 
as mammalian homologs, which bind GC-rich promoter 
sequences (46,47). Additionally, this motif resembles a 
Mad-like motif and Cbt was recently shown to enhance 
transcriptional activation of direct Dpp target genes (27). 
Importantly, recent work has suggested that Drosophila 
Cbt acts primarily as a transcriptional repressor (48), 
which runs counter to our simplest hypothesis that Cbt 
directly binds this motif to activate genes induced on 
Cbt overexpression. However, we cannot rule out the pos- 
sibility that Cbt acts indirectly, perhaps via repression of 
another factor that acts on this motif. Further work 
exploring these relationships between Cbt, the cell cycle 
and the TGF-β signaling pathway may help elucidate a 
new relationship between developmental signaling 
pathways and cell cycle control. 

The computational complexity of CPχ² is linear in both 
the number of conditions and the number of edges in the 
network, if network topology is given. If network 
topology must be learned from the data, the computa- 
tional complexity becomes linear in the number of 
conditions, polynomial in the number of nodes and expo- 
nential in the maximum number of parents per node. 
Exact fast chi-square algorithms exist for binary variables 
with two parents (49). The implementation of CPχ² 
already supports parallel computing using the Message 
Passing Interface protocol (50). In future biological ex- 
perimental designs, where two or more genes are simultan- 
eously disrupted in a network of thousands of genes, a fast 
and probably approximate implementation of CPχ² will 
be necessary. 
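
The linear cost for a known topology can be seen in a minimal sketch. 
It assumes that each edge comes with one parent-by-child contingency 
table per condition, and it uses a classical heterogeneity chi-square 
(the sum of per-condition statistics minus the statistic of the pooled 
table), which may differ in detail from the statistic implemented in 
CPχ². The function names and the toy gene names are hypothetical. 

```python
from typing import Dict, List, Sequence, Tuple

Table = Sequence[Sequence[float]]  # rows: parent states, columns: child states


def pearson_chisq(table: Table) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom of one table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total if total else 0.0
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    # Empty rows or columns contribute no degrees of freedom.
    r = sum(1 for s in row_sums if s > 0)
    c = sum(1 for s in col_sums if s > 0)
    return stat, max(r - 1, 0) * max(c - 1, 0)


def heterogeneity(tables: Sequence[Table]) -> Tuple[float, int]:
    """Classical heterogeneity chi-square across conditions (an assumption,
    not necessarily the exact CPχ² statistic): total minus pooled."""
    total_stat, total_df = 0.0, 0
    for table in tables:  # one pass per condition
        stat, df = pearson_chisq(table)
        total_stat += stat
        total_df += df
    # Pool the conditions by adding the tables cell by cell.
    pooled = [[sum(cells) for cells in zip(*rows)] for rows in zip(*tables)]
    pooled_stat, pooled_df = pearson_chisq(pooled)
    return total_stat - pooled_stat, total_df - pooled_df


def screen(
    edges: Dict[Tuple[str, str], List[Table]]
) -> Dict[Tuple[str, str], Tuple[float, int]]:
    """Visit every edge once; the work grows with edges times conditions."""
    return {edge: heterogeneity(tables) for edge, tables in edges.items()}


# Toy input: two conditions per edge, binary parent and child.
toy_edges = {
    ("geneA", "geneB"): [[[10, 0], [0, 10]], [[0, 10], [10, 0]]],  # rewired
    ("geneA", "geneC"): [[[10, 0], [0, 10]], [[10, 0], [0, 10]]],  # conserved
}
for edge, (stat, df) in screen(toy_edges).items():
    print(edge, round(stat, 3), df)
```

In this toy input the first edge reverses its pattern between conditions 
and receives a large heterogeneity statistic, whereas the second edge 
keeps the same pattern and receives a statistic of zero. 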

The CPχ² method has profound implications for 
analyzing biological networks. Because they make minimal 
assumptions about underlying mechanisms, discrete 
nonparametric contingency tables are preferable in 
systems without known parametric forms of interactions. 
CPχ² strikes a balance between differential correlation, 
which irreversibly compresses interaction patterns, and the 
noise-prone RTC, and our benchmarking suggests that it 
offers practical benefits beyond existing differential 
co-expression methods. The usefulness of CPχ² is 
demonstrated here by identifying heterogeneous 
gene interaction patterns between the E2F and Cbt transcrip- 
tion factors in regulating the cell cycle. Applicable to 
assays where multiple molecules are measured across mo- 
lecular contexts, CPχ² thus has the potential to underscore 
diversity in molecular mechanisms that implicate complex 
interaction patterns in differential network biology. 
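
The loss of information under differential correlation can be 
illustrated with a small sketch. The discretized expression levels 
below are hypothetical toy data, not taken from the study. The 
parent-child pattern is inverted between two conditions, yet the 
correlation is zero in both, so their difference is zero as well, 
while the two contingency tables remain clearly distinct. 

```python
from collections import Counter
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def contingency(x: Sequence[int], y: Sequence[int],
                n_x: int, n_y: int) -> List[List[int]]:
    """Count table with parent levels as rows and child levels as columns."""
    counts = Counter(zip(x, y))
    return [[counts[(i, j)] for j in range(n_y)] for i in range(n_x)]


# Hypothetical data: parent has three levels, child has two.
parent = [0, 1, 2] * 10
child_a = [1, 0, 1] * 10  # condition A: child high at low or high parent level
child_b = [0, 1, 0] * 10  # condition B: child high at intermediate level only

r_a = pearson_r(parent, child_a)
r_b = pearson_r(parent, child_b)
table_a = contingency(parent, child_a, 3, 2)
table_b = contingency(parent, child_b, 3, 2)

print("correlation difference:", abs(r_a - r_b))
print("table, condition A:", table_a)
print("table, condition B:", table_b)
```

A table-based comparison retains the full joint pattern of parent and 
child levels, which is the information a single correlation 
coefficient discards. 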